Argument-Centric Causal Intervention Method for Mitigating Bias in Cross-Document Event Coreference Resolution

Yao, Long, Yang, Wenzhong, Yin, Yabo, Wei, Fuyuan, Lv, Hongzhen, Peng, Jiaren, Wang, Liejun, Tao, Xiaoming

arXiv.org Artificial Intelligence

Cross-document Event Coreference Resolution (CD-ECR) is a fundamental task in natural language processing (NLP) that seeks to determine whether event mentions across multiple documents refer to the same real-world occurrence. However, current CD-ECR approaches predominantly rely on trigger features within input mention pairs, which induces spurious correlations between surface-level lexical features and coreference relationships, impairing the overall performance of the models. To address this issue, we propose a novel cross-document event coreference resolution method based on Argument-Centric Causal Intervention (ACCI). Specifically, we construct a structural causal graph to uncover confounding dependencies between lexical triggers and coreference labels, and introduce backdoor-adjusted interventions to isolate the true causal effect of argument semantics. To further mitigate spurious correlations, ACCI integrates a counterfactual reasoning module that quantifies the causal influence of trigger word perturbations, and an argument-aware enhancement module that promotes greater sensitivity to semantically grounded information. In contrast to prior methods that depend on costly data augmentation or heuristic-based filtering, ACCI enables effective debiasing in a unified end-to-end framework without altering the underlying training procedure. Extensive experiments demonstrate that ACCI achieves CoNLL F1 scores of 88.4% on ECB+ and 85.2% on GVC, establishing state-of-the-art performance. The implementation and materials are available at https://github.com/era211/ACCI.
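The backdoor adjustment mentioned in the abstract can be illustrated on a toy discrete distribution. This is a minimal sketch of the general formula P(Y | do(X=x)) = Σ_z P(Y | X=x, Z=z) P(Z=z), not the paper's implementation; the variable names and probabilities below are hypothetical (X = argument match status, Z = shared-trigger confounder, Y = coreference label).

```python
# Toy backdoor adjustment over a discrete confounder Z.
# P(Y=1 | do(X=x)) = sum_z P(Y=1 | X=x, Z=z) * P(Z=z)

def backdoor_adjusted(p_y_given_xz, p_z, x):
    """Intervened probability P(Y=1 | do(X=x))."""
    return sum(p_y_given_xz[(x, z)] * pz for z, pz in p_z.items())

# Hypothetical conditional table: coreference probability given whether
# the arguments match (X) and whether the mentions share a trigger (Z).
p_y_given_xz = {
    ("args_match", "same_trigger"): 0.9,
    ("args_match", "diff_trigger"): 0.7,
    ("args_differ", "same_trigger"): 0.4,
    ("args_differ", "diff_trigger"): 0.1,
}
p_z = {"same_trigger": 0.5, "diff_trigger": 0.5}

print(backdoor_adjusted(p_y_given_xz, p_z, "args_match"))  # approx. 0.8
```

Averaging over the confounder's marginal (rather than conditioning on it) is what removes the spurious trigger-label shortcut in this toy setting.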


LegalCore: A Dataset for Legal Documents Event Coreference Resolution

Wei, Kangda, Shi, Xi, Tong, Jonathan, Reddy, Sai Ramana, Natarajan, Anandhavelu, Jain, Rajiv, Garimella, Aparna, Huang, Ruihong

arXiv.org Artificial Intelligence

Recognizing events and their coreferential mentions in a document is essential for understanding semantic meanings of text. The existing research on event coreference resolution is mostly limited to news articles. In this paper, we present the first dataset for the legal domain, LegalCore, which has been annotated with comprehensive event and event coreference information. The legal contract documents we annotated in this dataset are several times longer than news articles, with an average length of around 25k tokens per document. The annotations show that legal documents have dense event mentions and feature both short-distance and super long-distance coreference links between event mentions. We further benchmark mainstream Large Language Models (LLMs) on this dataset for both event detection and event coreference resolution tasks, and find that this dataset poses significant challenges for state-of-the-art open-source and proprietary LLMs, which perform significantly worse than a supervised baseline. We will publish the dataset as well as the code.


ACCESS : A Benchmark for Abstract Causal Event Discovery and Reasoning

Vo, Vy, Qu, Lizhen, Feng, Tao, Hua, Yuncheng, Kang, Xiaoxi, Fan, Songhai, Dwyer, Tim, Soon, Lay-Ki, Haffari, Gholamreza

arXiv.org Artificial Intelligence

Identifying cause-and-effect relationships is critical to understanding real-world dynamics and ultimately causal reasoning. Existing methods for identifying event causality in NLP, including those based on Large Language Models (LLMs), exhibit difficulties in out-of-distribution settings due to the limited scale and heavy reliance on lexical cues within available benchmarks. Modern benchmarks, inspired by probabilistic causal inference, have attempted to construct causal graphs of events as a robust representation of causal knowledge, where CRAB (Romanou et al., 2023) is one such recent benchmark along this line. In this paper, we introduce ACCESS, a benchmark designed for discovery and reasoning over abstract causal events. Unlike existing resources, ACCESS focuses on the causality of everyday-life events at the abstraction level. We propose a pipeline for identifying abstractions for event generalizations from GLUCOSE (Mostafazadeh et al., 2020), a large-scale dataset of implicit commonsense causal knowledge, from which we subsequently extract 1.4K causal pairs. Our experiments highlight the ongoing challenges of using statistical methods and/or LLMs for automatic abstraction identification and causal discovery in NLP. Nonetheless, we demonstrate that the abstract causal knowledge provided in ACCESS can be leveraged for enhancing QA reasoning performance in LLMs.


MAQInstruct: Instruction-based Unified Event Relation Extraction

Xu, Jun, Sun, Mengshu, Zhang, Zhiqiang, Zhou, Jun

arXiv.org Artificial Intelligence

Extracting event relations that deviate from known schemas has proven challenging for previous methods based on multi-class classification, MASK prediction, or prototype matching. Recent advancements in large language models have shown impressive performance through instruction tuning. Nevertheless, in the task of event relation extraction, instruction-based methods face several challenges: there are a vast number of inference samples, and the relations between events are non-sequential. To tackle these challenges, we present an improved instruction-based event relation extraction framework named MAQInstruct. Firstly, we transform the task from extracting event relations using given event-event instructions to selecting events using given event-relation instructions, which reduces the number of samples required for inference. Then, by incorporating a bipartite matching loss, we reduce the dependency of the instruction-based method on the generation sequence. Our experimental results demonstrate that MAQInstruct significantly improves the performance of event relation extraction across multiple LLMs.
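The bipartite matching loss idea, scoring predictions against gold events under the best one-to-one assignment so that generation order does not matter, can be sketched as follows. This is an illustrative brute-force version (fine for small n), not MAQInstruct's actual loss; practical implementations use the Hungarian algorithm.

```python
# Order-invariant matching cost between predicted and gold events:
# take the minimum total cost over all one-to-one assignments.
from itertools import permutations

def min_matching_cost(cost):
    """cost[i][j] = cost of pairing prediction i with gold event j (square matrix)."""
    n = len(cost)
    return min(
        sum(cost[i][perm[i]] for i in range(n))
        for perm in permutations(range(n))
    )

cost = [[0.1, 0.9],
        [0.8, 0.2]]
print(min_matching_cost(cost))  # best assignment pairs 0->0 and 1->1
```

Because the minimum is taken over assignments, swapping the order of the predicted events leaves the cost unchanged, which is exactly the property that reduces dependence on the generation sequence.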


EventFull: Complete and Consistent Event Relation Annotation

Eirew, Alon, Nachshoni, Eviatar, Slobodkin, Aviv, Dagan, Ido

arXiv.org Artificial Intelligence

Identifying the semantic relations between events mentioned in a text, notably temporal, causal and coreference relations, has been a fundamental goal in NLP. Substantial efforts have been devoted to developing various datasets that capture some or all of such relations (O'Gorman et al., 2016; Hong et al., 2016; Wang et al., 2022). These datasets were then leveraged to develop and to evaluate corresponding models for detecting event-event relations (Hu et al., 2023; Guan et al., 2024). The output of such models has been utilized in a range of downstream applications, with recent examples including event forecasting (Ma et al., 2023), misinformation detection (Lei and Huang, 2023), and treatment timeline extraction (Yao et al., 2024), among others. MEANTIME (Minard et al., 2016) and EventStoryLine (Caselli and Vossen, 2017) restrict event pairs to a span of two consecutive sentences. This limitation inherently prevents testing and training models on longer-range relations. Other datasets, such as TimeBank (Pustejovsky et al., 2003b) and MAVEN-ERE (Wang et al., 2022), did not publish a systematic annotation execution protocol that guarantees actual complete annotation, and were subsequently criticized for being incomplete in their relation annotation (Pustejovsky and Stubbs, 2011; Rogers et al., 2024). Further, some researchers aimed to avoid the cost of manual annotation altogether and employed fully- or partly-automatic dataset creation methods (Mirza et al.,


Document-level Causal Relation Extraction with Knowledge-guided Binary Question Answering

Wang, Zimu, Xia, Lei, Wang, Wei, Du, Xinya

arXiv.org Artificial Intelligence

As an essential task in information extraction (IE), Event-Event Causal Relation Extraction (ECRE) aims to identify and classify the causal relationships between event mentions in natural language texts. However, existing research on ECRE faces two critical challenges: the lack of document-level modeling and causal hallucinations. In this paper, we propose a Knowledge-guided binary Question Answering (KnowQA) method with event structures for ECRE, consisting of two stages: Event Structure Construction and Binary Question Answering. We conduct extensive experiments under both zero-shot and fine-tuning settings with large language models (LLMs) on the MECI and MAVEN-ERE datasets. Experimental results demonstrate the usefulness of event structures for document-level ECRE and the effectiveness of KnowQA, which achieves state-of-the-art performance on the MECI dataset. We observe not only the effectiveness but also the high generalizability and low inconsistency of our method, particularly with complete event structures after fine-tuning the models.
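The binary-question reformulation can be sketched as a prompt template: each candidate cause-effect pair becomes a yes/no question over the document context. The wording below is hypothetical, not KnowQA's actual prompt.

```python
# Illustrative binary-QA framing for event-event causal relation extraction:
# one yes/no question per candidate (cause, effect) pair.

def make_binary_question(context, cause, effect):
    """Build a yes/no prompt asking whether `cause` causes `effect` in `context`."""
    return (
        f"Context: {context}\n"
        f'Question: Does the event "{cause}" cause the event "{effect}"?\n'
        "Answer with yes or no."
    )

q = make_binary_question(
    "The storm knocked out power across the city overnight.",
    "storm",
    "power outage",
)
print(q)
```

Framing the task as many small yes/no decisions, rather than one free-form generation, is what makes hallucinated relations easier to suppress and to audit.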


Event Extraction for Portuguese: A QA-driven Approach using ACE-2005

Cunha, Luís Filipe, Campos, Ricardo, Jorge, Alípio

arXiv.org Artificial Intelligence

Event extraction is an Information Retrieval task that commonly consists of identifying the central word for the event (trigger) and the event's arguments. This task has been extensively studied for English but lags behind for Portuguese, partly due to the lack of task-specific annotated corpora. This paper proposes a framework in which two separate BERT-based models were fine-tuned to identify and classify events in Portuguese documents. We decompose this task into two sub-tasks. Firstly, we use a token classification model to detect event triggers. To extract event arguments, we train a Question Answering model that queries the triggers about their corresponding event argument roles. Given the lack of event-annotated corpora in Portuguese, we translated the original version of the ACE-2005 dataset (a reference in the field) into Portuguese, producing a new corpus for Portuguese event extraction. To accomplish this, we developed an automatic translation pipeline. Our framework obtains F1 scores of 64.4 for trigger classification and 46.7 for argument classification, thus setting a new state-of-the-art reference for these tasks in Portuguese.
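The QA-driven argument extraction step described above can be sketched as question generation per role: for each detected trigger, one question per argument role is produced and passed to a QA model. The role set and templates below are illustrative assumptions, not the paper's actual inventory.

```python
# Illustrative role-question generation for QA-based argument extraction.
# Roles and templates are hypothetical; a real system would use the
# dataset's role inventory (e.g. ACE-2005 roles).

ROLE_TEMPLATES = {
    "Agent": "Who carried out the event triggered by '{trigger}'?",
    "Place": "Where did the event triggered by '{trigger}' take place?",
    "Time": "When did the event triggered by '{trigger}' happen?",
}

def argument_queries(trigger):
    """Map each argument role to the question a QA model would answer for it."""
    return {role: tpl.format(trigger=trigger) for role, tpl in ROLE_TEMPLATES.items()}

queries = argument_queries("attack")
for role, question in queries.items():
    print(role, "->", question)
```

Each generated question, paired with the sentence containing the trigger, would then be answered by the fine-tuned QA model, with the answer span taken as the argument filler.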


Are LLMs Good Annotators for Discourse-level Event Relation Extraction?

Wei, Kangda, Gautam, Aayush, Huang, Ruihong

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated proficiency in a wide array of natural language processing tasks. However, their effectiveness on discourse-level event relation extraction (ERE) tasks remains unexplored. In this paper, we assess the effectiveness of LLMs in addressing discourse-level ERE tasks characterized by lengthy documents and intricate relations encompassing coreference, temporal, causal, and subevent types. Evaluation is conducted using a commercial model, GPT-3.5, and an open-source model, LLaMA-2. Our study reveals a notable underperformance of LLMs compared to the baseline established through supervised learning. Although Supervised Fine-Tuning (SFT) can improve LLMs' performance, it does not scale well compared to the smaller supervised baseline model. Our quantitative and qualitative analysis shows that LLMs have several weaknesses when applied to extracting event relations, including a tendency to fabricate event mentions, and failures to capture transitivity rules among relations, detect long-distance relations, or comprehend contexts with dense event mentions.


Harvesting Events from Multiple Sources: Towards a Cross-Document Event Extraction Paradigm

Gao, Qiang, Meng, Zixiang, Li, Bobo, Zhou, Jun, Li, Fei, Teng, Chong, Ji, Donghong

arXiv.org Artificial Intelligence

Document-level event extraction aims to extract structured event information from unstructured text. However, a single document often contains limited event information, and the roles of different event arguments may be biased due to the influence of the information source. This paper addresses the limitations of traditional document-level event extraction by proposing the task of cross-document event extraction (CDEE) to integrate event information from multiple documents and provide a comprehensive perspective on events. We construct a novel cross-document event extraction dataset, namely CLES, which contains 20,059 documents and 37,688 mention-level events, of which over 70% are cross-document. To build a benchmark, we propose a CDEE pipeline that includes 5 steps, namely event extraction, coreference resolution, entity normalization, role normalization and entity-role resolution. Our CDEE pipeline achieves about 72% F1 in end-to-end cross-document event extraction, highlighting the difficulty of this task. Our work opens a new line of information extraction research and we expect it to attract further research attention.
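The five-step pipeline named in the abstract can be sketched as a staged runner that threads shared state through the stages in order. Every stage below is a placeholder that only records its name; the actual stage implementations and interfaces are the authors' and are not shown here.

```python
# Skeleton of a five-stage CDEE pipeline. Stage bodies are stubs
# (identity functions); only the stage ordering is illustrated.

def run_pipeline(state, stages):
    """Apply (name, fn) stages in order, threading a state dict through."""
    for name, fn in stages:
        state = fn(state)
        state.setdefault("trace", []).append(name)  # record execution order
    return state

identity = lambda s: s
stages = [
    ("event_extraction", identity),
    ("coreference_resolution", identity),
    ("entity_normalization", identity),
    ("role_normalization", identity),
    ("entity_role_resolution", identity),
]

result = run_pipeline({"docs": ["doc1", "doc2"]}, stages)
print(result["trace"])
```

A staged design like this makes each step independently replaceable and lets errors be attributed to a specific stage, which matters when end-to-end F1 (about 72% here) compounds per-stage losses.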


Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information

Gao, Qiang, Li, Bobo, Meng, Zixiang, Li, Yunlong, Zhou, Jun, Li, Fei, Teng, Chong, Ji, Donghong

arXiv.org Artificial Intelligence

Existing cross-document event coreference resolution models, which either compute mention similarity directly or enhance mention representations by extracting event arguments (such as location, time, agent, and patient), lack the ability to utilize document-level information. As a result, they struggle to capture long-distance dependencies, which leads to underwhelming performance on events whose argument information relies on such dependencies. In light of these limitations, we propose the construction of document-level Rhetorical Structure Theory (RST) trees and cross-document Lexical Chains to model the structural and semantic information of documents. Subsequently, cross-document heterogeneous graphs are constructed and a GAT is utilized to learn the representations of events. Finally, a pair scorer calculates the similarity between each pair of events, and co-referring events can be recognized using a standard clustering algorithm. Additionally, as the existing cross-document event coreference datasets are limited to English, we have developed a large-scale Chinese cross-document event coreference dataset to fill this gap, which comprises 53,066 event mentions and 4,476 clusters. Applied to the English and Chinese datasets respectively, our model outperforms all baselines by large margins.
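The final step, turning pairwise coreference scores into event clusters, can be sketched with a simple stand-in for the "standard clustering algorithm": union-find over pairs whose score clears a threshold, i.e. connected components of the thresholded similarity graph. The threshold and scores below are illustrative, not from the paper.

```python
# Cluster events from pairwise coreference scores via union-find
# (connected components of the graph of above-threshold pairs).

def cluster_events(pair_scores, n_events, threshold=0.5):
    """pair_scores: {(i, j): score}; returns clusters as sorted lists of ids."""
    parent = list(range(n_events))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), score in pair_scores.items():
        if score > threshold:
            parent[find(i)] = find(j)  # merge the two components

    clusters = {}
    for i in range(n_events):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

scores = {(0, 1): 0.9, (1, 2): 0.2, (2, 3): 0.8}
print(cluster_events(scores, 4))  # [[0, 1], [2, 3]]
```

Transitivity of coreference comes for free here: if (a, b) and (b, c) both score above threshold, a, b, and c land in the same component even when (a, c) was scored low.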